\documentclass[11pt,a4paper]{article}

\usepackage{url}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{multirow}
\usepackage{listings}
\usepackage{fullpage}
\usepackage{fancyhdr,a4wide}
\usepackage{makeidx}
\usepackage{placeins}
\usepackage{subfigure} % provides the \subfigure command used in the results section
%\usepackage[procnames,noindent]{lgrind}

\lstset{ %
language=VHDL,                % choose the language of the code
basicstyle=\footnotesize,       % the size of the fonts that are used for the code
showstringspaces=false,         % underline spaces within strings
%numbers=left,                   % where to put the line-numbers
%numberstyle=\footnotesize,      % the size of the fonts that are used for the line-numbers
%stepnumber=1,                   % the step between two line-numbers. If it's 1 each line will be numbered
%numbersep=5pt,                  % how far the line-numbers are from the code
%backgroundcolor=\color{white},  % choose the background color. You must add \usepackage{color}
showspaces=false,               % show spaces within strings adding particular underscores
showtabs=false,                 % show tabs within strings adding particular underscores
escapeinside={\%*}{*)}          % if you want to add a comment within your code
}

\begin{document}	

\begin{titlepage}

\thispagestyle{fancy}
\lhead{}
\chead{
\large{\textit{
Informatics and Mathematical Modelling\\
Technical University of Denmark}}}
\rhead{}
\rule{0pt}{50pt}
\vspace{3cm}

\begin{center}
 	\huge{\textbf{02207 : Advanced Digital Design Techniques}}\\
 	\vspace{1cm}
 	\huge{Low-pass Filter (2 x 1-D)}\\
 	\vspace{1cm}
 	\huge{\textit{Examination Project}}\\
 	\vspace{1cm}
 	\huge{Group \textit{dt07}}\\
\end{center}

\vspace{4cm}

\begin{flushright}
	\LARGE{Markku Eerola (s053739)}\\
	\vspace{0.3cm}
	\LARGE{Rajesh Bachani (s061332)}\\
	\vspace{0.3cm}
	\LARGE{Josep Renard (s071158)}\\
\end{flushright}
\cfoot{\today}
\end{titlepage}

\newpage 
\tableofcontents

\newpage

\section{About the Report}
In this project, we have designed and implemented the architecture of an image filtering processor that uses a $3\times3$ filter to perform a convolution on an image of $256\times256$ pixels.

The work was divided among the authors as follows.
\paragraph{Authors by Section}
\begin{itemize}
\item \textit{Rajesh Bachani} 
	\begin{itemize}
		\item VHDL: FSM\_In.vhd, FSM\_Out.vhd, Processor\_3.vhd, TB\_Filter.vhd
		\item Report: Sections \ref{sec:sequencing} and \ref{sec:controller}
	\end{itemize}
\item \textit{Josep Renard} 
	\begin{itemize}
		\item VHDL: Multiplier.vhd, parcial.vhd, CRA\_15.vhd, CSA\_15bit.vhd
		\item Report: Sections \ref{sec:results} and \ref{sec:limitations}
	\end{itemize}
\item \textit{Markku Eerola}
	\begin{itemize}
		\item VHDL: Adder\_2.vhd, Adder\_3.vhd, Mux\_4.vhd, SHIFTREG.vhd, Memory.vhd, REG.vhd
		\item Report: Sections \ref{sec:design} and \ref{sec:synthesis}
	\end{itemize}

\end{itemize}

The rest of the report is organized as follows. Section \ref{sec:design} explains the internal architecture of the processor. Section \ref{sec:sequencing} gives the sequencing of the operations involved in the computation: the memory initialization, the order in which the input memory is read and the output memory is read and written, and the memory access order, which differs between the horizontal and vertical movements of the filter mask. Section \ref{sec:controller} explains the design of the input and output controllers. Section \ref{sec:synthesis} presents the results of synthesizing the design in Design Vision. Section \ref{sec:results} contains the images obtained after convolution and a summary of the synthesis results. Finally, section \ref{sec:limitations} briefly discusses what we consider to be the limitations of this work and how it could be extended.

\section{Design Architecture}
\label{sec:design}
The overall design of the filter unit is shown in figure \ref{fig:proc}. A more detailed view of the architecture is shown in figure \ref{fig:procdetail}.

\begin{figure}[h]
	\centering
		\includegraphics[width=6in]{./images/processador.PNG}
	\caption{Filter Unit Design}	\label{fig:proc}
\end{figure}

\begin{figure}[h]
	\centering
		\includegraphics[width=6in]{./images/insideproc.PNG}
	\caption{Processor Architecture}	\label{fig:procdetail}
\end{figure}

\FloatBarrier

The input and output signals of the processor are listed below, together with their bit widths.

\begin{itemize}
		\item \textit{Data\_in\_1}, 8 bits, transfers pixels of the image from the input memory to the processor's cache
		\item \textit{Clock}, 1 bit, the common clock signal
		\item \textit{Reset}, 1 bit, the common active low reset signal
		\item \textit{Filter}, 8 bits, used to get the $n$ pixels of the filter, 8 bits over $n$ clock cycles ($n$ = 3 here)
		\item \textit{Disable\_filter}, 1 bit, used to disable the filter, so the filter register is not written
		\item \textit{Data\_in\_2}, 8 bits, transfers pixels of the image from the output memory to the processor
		\item \textit{Read\_In\_Mem}, 1 bit, tells the input memory when to read pixels of the input image
		\item \textit{Write\_In\_Mem}, 1 bit, tells the input memory when to write pixels of the input image (used only while the input memory is being initialized)
		\item \textit{Read\_Addr\_In\_Mem}, 16 bits, indicates the address from where the next pixel should be read from the input memory
		\item \textit{Data\_Out}, 8 bits, transfers processed pixels from the processor to the output memory
		\item \textit{Write\_Addr\_Out\_Mem}, 16 bits, indicates the address to which the next pixel should be written in the output memory
		\item \textit{Read\_Addr\_Out\_Mem}, 16 bits, indicates the address from where the next pixel must be read from the output memory
		\item \textit{Write\_Out\_Mem}, 1 bit, tells the output memory when to allow writing of data
		\item \textit{Read\_Out\_Mem}, 1 bit, tells the output memory when to allow reading of data
\end{itemize}
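
As a compact restatement of the list above, the interface can be sketched as a VHDL entity declaration. This is only a sketch: the entity name \texttt{processor\_sketch}, the use of \texttt{std\_logic} and \texttt{std\_logic\_vector}, and the port directions are assumptions inferred from the signal descriptions, so the actual declaration in Processor\_3.vhd may differ.

\begin{lstlisting}[frame=trbl]
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: port names follow the signal list above; the types and
-- directions are assumed and are not copied from Processor_3.vhd.
entity processor_sketch is
  port (
    Clock              : in  std_logic;
    Reset              : in  std_logic;                      -- active low
    Filter             : in  std_logic_vector(7 downto 0);   -- one coefficient per cycle
    Disable_filter     : in  std_logic;
    Data_in_1          : in  std_logic_vector(7 downto 0);   -- from the input memory
    Data_in_2          : in  std_logic_vector(7 downto 0);   -- from the output memory
    Read_In_Mem        : out std_logic;
    Write_In_Mem       : out std_logic;
    Read_Addr_In_Mem   : out std_logic_vector(15 downto 0);
    Data_Out           : out std_logic_vector(7 downto 0);   -- to the output memory
    Write_Addr_Out_Mem : out std_logic_vector(15 downto 0);
    Read_Addr_Out_Mem  : out std_logic_vector(15 downto 0);
    Write_Out_Mem      : out std_logic;
    Read_Out_Mem       : out std_logic
  );
end entity processor_sketch;
\end{lstlisting}

The 16-bit address ports match the image size, since $2^{16} = 65536 = 256 \times 256$.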


\section{Sequencing of Operations}
\label{sec:sequencing}

This section contains a description of the sequence in which the processor performs the operations needed for convolution of the image. 

\subsection{Memory Initialization}

Before the processor starts its operations, a memory initialization step sets both the input and the output memories to their initial values. The input memory is initialized with the image pixels, which is done through the testbench. The output memory has to be initialized to zero.

\subsection{Memory Read and Write by Processor}
\label{sec:memreadwrite}

The processor contains two controllers, the Input Controller and the Output Controller (indicated by FSM\_In and FSM\_Out in figure \ref{fig:procdetail}). The Input Controller is responsible for reading pixels from the input memory, which holds the original image. The Output Controller is responsible both for reading pixels from the output memory and for writing the computed pixels to it. At the end of the computation, the output memory holds the pixels of the convoluted image.

It is very important that these two controllers are well synchronized with each other, so that the operations are performed smoothly and no data is lost. In particular, when the Input Controller is active, the Output Controller should not perform any operation. This is because the convolution result is not stable until the Input Controller has read the next 3 pixels from the input memory (3 for the $3\times3$ filter; it would be $n$ for an $n \times n$ filter), so the Output Controller cannot yet write anything to the output memory. The converse also holds: when the Output Controller is active, the Input Controller should be inactive. If the Input Controller read new pixels while the Output Controller was still writing to the output memory, the already computed values would be overwritten and the synchronization would be lost completely.

\begin{figure}[h]
	\centering
		\includegraphics[width=6in]{./images/controllerreadwrite.PNG}
	\caption{Read Write Synchronization between Input and Output Controllers(1)}	
	\label{fig:controllerreadwrite}
\end{figure}
\FloatBarrier

Hence we have chosen an approach in which the two controllers are never active at the same time, as shown in figure \ref{fig:controllerreadwrite}. As we can see, the signal \textit{read\_in\_mem} is \textit{1} for three clock cycles. The signal \textit{disable\_to\_cache} becomes \textit{1} one clock cycle later than \textit{read\_in\_mem}. Once \textit{read\_in\_mem} becomes \textit{0}, \textit{read\_out\_mem} becomes \textit{1}, and at the same time \textit{disable\_to\_cache} becomes \textit{0}. Then there is no activity for one clock cycle, during which the convolution is done, after which \textit{write\_out\_mem} becomes \textit{1} to write the convoluted pixel to the output memory. The signal \textit{read\_in\_mem} becomes \textit{1} again after nine clock cycles, i.e. when the Output Controller has finished reading and writing three convoluted pixels to the output memory.

During the horizontal movement, when the horizontal end of the image is reached, the Output Controller becomes idle for a longer duration. This is because the Input Controller now has to load nine fresh pixels from the input memory, which takes 27 clock cycles. So, once the end of the image is reached (this also holds for the vertical movement), the Output Controller is idle for 27 cycles and then resumes reading and writing as usual, which continues until the end of the image is reached again (which is 256 rounds of read and write). Figure \ref{fig:controllerreadwrite2} depicts this situation in the simulation.
 
\begin{figure}[h]
	\centering
		\includegraphics[width=6in]{./images/controllerreadwrite2.PNG}
	\caption{Read Write Synchronization between Input and Output Controllers(2)}	
	\label{fig:controllerreadwrite2}
\end{figure}
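
The cycle counts quoted in this subsection can be reproduced with a short calculation. The symbols $C_{\mathrm{in}}$, $C_{\mathrm{out}}$, $C_{\mathrm{step}}$ and $C_{\mathrm{line}}$ are shorthand introduced here for illustration and do not appear in the design files. Taking one cycle per pixel read by the Input Controller and three cycles (read, idle, write) per pixel handled by the Output Controller, we get for the $3\times3$ filter:
\begin{align*}
C_{\mathrm{in}} &= 3, \\
C_{\mathrm{out}} &= 3 \times 3 = 9, \\
C_{\mathrm{step}} &= C_{\mathrm{in}} + C_{\mathrm{out}} = 12, \\
C_{\mathrm{line}} &= 2\,C_{\mathrm{step}} + C_{\mathrm{in}} = 2 \times 12 + 3 = 27.
\end{align*}
The last line reads the 27-cycle wait at a line change as two complete steps followed by one further three-pixel read. This decomposition is one reading that is consistent with the stated count and with the general expression in section \ref{sec:limitations}; it is not a figure taken from the simulation.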



\subsection{Memory Access Sequence}
\label{sec:memoryaccess}

\begin{figure}[h]
	\centering
		\includegraphics[width=5in]{./images/memoryAccess.PNG}
	\caption{Sequence of Input Memory Address Access}	
	\label{fig:memaccess}
\end{figure}

\FloatBarrier

Figure \ref{fig:memaccess} shows the sequence in which the Input Controller accesses the pixels of the input memory. During the horizontal movement, three pixels are accessed in every column, and the access moves in the same direction as the filter, i.e. from left to right. Once the horizontal end of the image is reached, the access continues from the next row, again with three pixels in every column and moving horizontally.

When the horizontal movement is completed, the vertical movement takes place. During this, three pixels are accessed in every row of the image, and the movement is vertical, from top to bottom. On reaching a vertical end of the image, the reading starts at the top of the next column, again with 3 pixels in each row and movement from top to bottom.
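
To make the two access orders concrete, assume that the image is stored row by row, so that the pixel in row $r$ and column $c$ (both counted from 0) sits at address $a(r,c) = 256\,r + c$. This layout, and the choice of centring the three pixels on $(r,c)$, are assumptions made purely for illustration; the report does not state the address mapping used in FSM\_In.vhd. Under these assumptions, the three addresses issued for one step would be
\begin{align*}
\text{horizontal movement, column } c:\quad & a(r-1,c),\ a(r,c),\ a(r+1,c) \;=\; a(r,c)-256,\ a(r,c),\ a(r,c)+256,\\
\text{vertical movement, row } r:\quad & a(r,c-1),\ a(r,c),\ a(r,c+1) \;=\; a(r,c)-1,\ a(r,c),\ a(r,c)+1,
\end{align*}
so consecutive reads would differ by 256 during the horizontal movement and by 1 during the vertical movement.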

\begin{figure}[h]
	\centering
		\includegraphics[width=5in]{./images/outmemoryaccess.jpg}
	\caption{Sequence of Output Memory Address Access}	
	\label{fig:outmemaccess}
\end{figure}
\FloatBarrier

The memory access of the Output Controller is simple compared to that of the Input Controller. The right part of figure \ref{fig:outmemaccess} shows the access for the horizontal movement. The topmost and bottommost rows are not written. Once a row is completed, the pixels of the next row are read and written.

For the vertical movement, the first and the last columns are not written. The read and write sequence starts from the second column, and ends when the second last column is finished.


\section{Finite State Machines}
\label{sec:controller}

The following subsections explain the finite state machines describing the Input and the Output Controllers.

\subsection{Input Controller}

The Input Controller has the following 16 states.
\begin{enumerate}
\item State \textit{init}. This is the initialization state of the controller. The next state is \textit{init\_in\_memory\_1}.

\item State \textit{init\_in\_memory\_1}. This state performs the initialization of the input memory. The Controller sets the \textit{can\_write} signal to \textit{1}, so that the byte read from the \texttt{.hex} file is written to the \textit{Data\_In} of the input memory. The address to which the byte should be written is also given by the Controller, through the signal \textit{address}. The next state is \textit{init\_in\_memory\_2}.

\item State \textit{init\_in\_memory\_2}. This state does the same thing as the previous state, and it increments a counter. Once the value of the counter is greater than 65536, which means that the entire memory has been written, the state changes to \textit{h\_read\_1}.

\item State \textit{h\_read\_1}. This state performs the read operation from the input memory for one pixel. The signal \textit{can\_write} is set to \textit{1}. The next state is \textit{h\_read\_2}.

\item State \textit{h\_read\_2}. This state performs the read operation from the input memory for one pixel. The next state is \textit{h\_read\_3}. The signal \textit{disable\_cache} is set to \textit{0} here, since from now we want the cache to start loading the values from the input memory. 

\item State \textit{h\_read\_3}. This state performs the read operation from the input memory for one pixel. The next state is \textit{h\_cache}.

\item State \textit{h\_cache}. This state is used as a delay. The signal \textit{disable\_cache} should be delayed by one clock cycle as compared to the signal \textit{can\_read} since the byte from the memory comes one clock cycle after the \textit{can\_read} is active. The signal \textit{can\_read} is set to \textit{0} here. The next state is \textit{h\_wait}.

\item State \textit{h\_wait}. This state is used as a wait state, during which the Output FSM is active. The \textit{disable\_cache} signal is also set to \textit{1} here, so that no more values from the memory are read into the cache until the Input Controller becomes active again. The next state is \textit{h\_temp}.

\item State \textit{h\_temp}. This state is also a wait state; the finite state machine keeps shuttling between this state and the previous one for 27 clock cycles. Once the Output Controller has written the new pixels to the output memory, the wait is over, the next state is set to \textit{v\_read\_1}, and the vertical reading of the memory starts.

The states for the vertical movement are kept separate from the states for the horizontal movement. This is because the order in which the addresses are generated for the \textit{address} signal differs between the two movements (as also shown in figure \ref{fig:memaccess}). The purpose of the following states is otherwise the same, with only the address values being different, so we skip their explanation.
\item State \textit{v\_read\_1}. 

\item State \textit{v\_read\_2}. 

\item State \textit{v\_read\_3}. 

\item State \textit{v\_cache}. 

\item State \textit{v\_wait}. 

\item State \textit{v\_temp}.

\item State \textit{exit\_in}. This is the exit state of the finite state machine.

\end{enumerate}
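
The state list above maps naturally onto a VHDL enumerated type with a two-process state machine. The following is a minimal sketch under stated assumptions, not the contents of FSM\_In.vhd: the entity name and the inputs \texttt{init\_done}, \texttt{wait\_done} and \texttt{h\_done} are hypothetical stand-ins for the internal counters, the loop back from \textit{init\_in\_memory\_2} and the return from \textit{h\_temp} to \textit{h\_read\_1} are assumed, and the vertical states are declared but not modelled.

\begin{lstlisting}[frame=trbl]
library ieee;
use ieee.std_logic_1164.all;

entity fsm_in_sketch is
  port (
    clock     : in  std_logic;
    reset     : in  std_logic;  -- active low
    init_done : in  std_logic;  -- hypothetical: initialization counter expired
    wait_done : in  std_logic;  -- hypothetical: Output Controller has finished
    h_done    : in  std_logic;  -- hypothetical: horizontal pass is complete
    can_read  : out std_logic
  );
end entity fsm_in_sketch;

architecture sketch of fsm_in_sketch is
  -- The 16 states listed above.
  type state_t is (init, init_in_memory_1, init_in_memory_2,
                   h_read_1, h_read_2, h_read_3, h_cache, h_wait, h_temp,
                   v_read_1, v_read_2, v_read_3, v_cache, v_wait, v_temp,
                   exit_in);
  signal state, next_state : state_t;
begin

  -- State register with asynchronous, active-low reset.
  process (clock, reset)
  begin
    if reset = '0' then
      state <= init;
    elsif rising_edge(clock) then
      state <= next_state;
    end if;
  end process;

  -- Next-state logic for the initialization and horizontal states.
  process (state, init_done, wait_done, h_done)
  begin
    next_state <= state;  -- default: hold (vertical states not modelled)
    case state is
      when init             => next_state <= init_in_memory_1;
      when init_in_memory_1 => next_state <= init_in_memory_2;
      when init_in_memory_2 =>
        if init_done = '1' then
          next_state <= h_read_1;
        else
          next_state <= init_in_memory_1;  -- assumed loop-back
        end if;
      when h_read_1 => next_state <= h_read_2;
      when h_read_2 => next_state <= h_read_3;
      when h_read_3 => next_state <= h_cache;
      when h_cache  => next_state <= h_wait;
      when h_wait   => next_state <= h_temp;
      when h_temp   =>
        if wait_done = '0' then
          next_state <= h_wait;    -- keep shuttling while waiting
        elsif h_done = '1' then
          next_state <= v_read_1;  -- start the vertical movement
        else
          next_state <= h_read_1;  -- assumed return for the next step
        end if;
      when others => null;
    end case;
  end process;

  -- Assumed output decoding: can_read is high in the three read states.
  can_read <= '1' when state = h_read_1 or state = h_read_2
                    or state = h_read_3
              else '0';

end architecture sketch;
\end{lstlisting}

Separating the state register from the next-state logic keeps the transition table readable and makes it easy to add the vertical states by mirroring the horizontal branch.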


\subsection{Output Controller}
The Output Controller has the following 18 states.
\begin{enumerate}
\item State \textit{init}. This is the initialization state of the controller. The next state is \textit{init\_out\_memory\_1}.

\item State \textit{init\_out\_memory\_1}. This state initializes the output memory to \textit{0}. The controller puts the \textit{can\_write} signal to \textit{1} and the \textit{write\_address} is incremented every time in the state. The next state is \textit{init\_out\_memory\_2}.

It is important to note that the output memory is initialized with zeros because the \textit{Data\_In\_2} signal coming from the output memory is set to zero, using the \textit{give\_zeros} signal of the output memory. The multiplexer also gives zero in this state, since the select is forced to `11' by the controller. Hence the \textit{Data\_Out} of the processor is always zero in this state.

\item State \textit{init\_out\_memory\_2}. This state performs the same function as the previous state. A counter is maintained; once it is greater than 65536, the memory is initialized and the next state is \textit{h\_init\_1}.

\item State \textit{h\_init\_1}. This state is created in order to wait for the cache shift register to get the pixels from the input memory. Actually, when the Input Controller finishes reading a line in the memory, during any of the horizontal or vertical movements, the Output Controller must wait for the time till the cache is filled with the new 9 pixels. The next state is \textit{h\_init\_2}.

\item State \textit{h\_init\_2}. This state performs the same function, and waits till the 9 pixels are filled in the cache shift register. This takes 27 clock cycles, since the Input Controller also remains idle in between. Once this is done, the next state is set to \textit{h\_read\_1}. 

\item State \textit{h\_read\_1}. This state puts the \textit{can\_read} signal to \textit{1}. Also, the corresponding \textit{read\_address} is set. The next state is \textit{h\_read\_write}.

\item State \textit{h\_read\_write}. In this state, the Controller remains idle, so that the data from the output memory is received in the next clock cycle. The next state is \textit{h\_write\_1}. 

\item State \textit{h\_write\_1}. This state forces the adder to be selected, by changing the \textit{sel} signal. The old pixel from the output memory and the new pixel from the adder are added. The signal \textit{can\_write} is set to \textit{1} and the \textit{write\_address} is set to the same value as the \textit{read\_address} in the previous state. If the end of a row or column is reached in the memory, the next state is \textit{h\_init\_1}; otherwise the next state is \textit{h\_wait\_1}. If the end of the image is detected, the vertical movement begins, and in that case the next state is set to \textit{v\_init\_1}.

\item State \textit{h\_wait\_1}. In this state, the Controller waits for the Input Controller to read 3 new pixels from the input memory. The next state is \textit{h\_wait\_2}.

\item State \textit{h\_wait\_2}. This state performs the same function as the previous state. If 3 clock cycles are over, i.e. if the Input Controller has read 3 new pixels, then the Controller gets active again, and the next state is set to \textit{h\_read\_1}.

Again, we do not explain the states for the vertical movement, since they perform the same functions as the corresponding states for the horizontal movement.
\item State \textit{v\_init\_1}. 

\item State \textit{v\_init\_2}. 

\item State \textit{v\_read\_1}. 

\item State \textit{v\_read\_write}. 

\item State \textit{v\_write\_1}. 

\item State \textit{v\_wait\_1}. 

\item State \textit{v\_wait\_2}.

\item State \textit{exit\_in}. This is the exit state of the finite state machine.

\end{enumerate}

\section{Synthesis}
\label{sec:synthesis}
We synthesized the design using four different clock periods, namely 7 ns, 5 ns, 3 ns and 2 ns, and let Design Vision optimize the design for speed to get the fastest possible design. It turned out that 2 ns is the minimum clock period for our design; Design Vision was not able to synthesize a faster design even when we tried. To get meaningful power reports, we simulated the switching activity with the VSS Simulator and passed the activity on to Design Vision. In addition to the power reports, we also obtained area and timing reports for the design at all four clock periods. The actual reports can be found in the appendix; a summary of the results is given in table \ref{tab:synth}. The power breakdown of the design at all four clock periods is given in table \ref{tab:broken}. In the breakdown, \textit{MULT}$_{conv}$ and \textit{ADD}$_{conv}$ refer to the combined power dissipation of all multipliers and of all adders involved in the convolution, respectively. For a more detailed breakdown, please refer to the appendix.

\begin{table}[h]
	\caption{Summary of Design Vision reports}
	\begin{center}
		\begin{tabular}{|l|l|l|l|l|l|l|} \hline
			\textbf{T}$_{C}$ [ns]	& \textbf{P}$_{stat}$ [mW] & \textbf{P}$_{dyn}$ [mW]	& \textbf{P}$_{tot}$ [mW] & \textbf{A}$_{comb}$ [$\mu\mathrm{m}^2$] & \textbf{A}$_{tot}$ [$\mu\mathrm{m}^2$] & \textbf{T}$_{cp}$ [ns] \\ \hline
			7 & 0.16 & 1.77 & 1.93 & 44067 & 53079 & 4.7 \\ \hline
			5 & 0.16 & 1.88 & 2.04 & 44067 & 53079 & 4.7 \\ \hline
			3 & 0.11 & 2.17 & 2.28 & 49595 & 58611 & 2.9 \\ \hline
			2 & 0.19 & 2.59 & 2.78 & 58668 & 67700 & 1.9 \\ \hline
		\end{tabular}
	\end{center}
	\label{tab:synth}
\end{table}
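
As a quick consistency check on table \ref{tab:synth}, the total power is the sum of the static and dynamic contributions. The difference between the clock period and the critical-path delay gives a rough timing margin, written $S$ here; this symbol is introduced only for illustration, and the simple difference ignores setup time and clock uncertainty. For the 2 ns design:
\begin{align*}
P_{\mathrm{tot}} &= P_{\mathrm{stat}} + P_{\mathrm{dyn}} = 0.19 + 2.59 = 2.78\ \mathrm{mW}, \\
S &= T_{C} - T_{cp} = 2 - 1.9 = 0.1\ \mathrm{ns}.
\end{align*}
The same calculation for the 7 ns, 5 ns and 3 ns rows gives 1.93, 2.04 and 2.28 mW, with margins of 2.3, 0.3 and 0.1 ns respectively.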

\begin{table}[h]
	\caption{Power breakdown, total power in mW}
	\begin{center}
		\begin{tabular}{|p{0.4cm}|p{1.15cm}|p{1.25cm}|p{1.5cm}|p{1.55cm}|p{1.75cm}|p{1.45cm}|p{1cm}|p{1.3cm}|p{1.25cm}|} \hline
			\textbf{T}$_{C}$ & \textbf{FSM}$_{in}$ & \textbf{FSM}$_{out}$ & \textbf{REG}$_{filter}$ & \textbf{REG}$_{cache}$ & \textbf{MULT}$_{conv}$ & \textbf{ADD}$_{conv}$ & \textbf{MUX} & \textbf{ADD}$_{out}$ & \textbf{REG}$_{out}$ \\ \hline
			7 & 0.207 & 0.242 & 0.396 & 0.547 & 0.363 & 0.036 & 0.035 & 0.050 & 0.053 \\ \hline
			5 & 0.256 & 0.300 & 0.396 & 0.549 & 0.365 & 0.036 & 0.036 & 0.056 & 0.052 \\ \hline 
			3 & 0.370 & 0.436 & 0.401 & 0.548 & 0.224 & 0.110 & 0.040 & 0.126 & 0.052 \\ \hline
			2 & 0.512 & 0.606 & 0.395 & 0.557 & 0.371 & 0.103 & 0.044 & 0.143 & 0.052 \\ \hline
		\end{tabular}
	\end{center}
	\label{tab:broken}
\end{table}

The critical path of the design (see figure \ref{fig:procdetail}) is the same for all four clock periods and goes through the cache register, the convolution multipliers, the convolution adders, the multiplexer, the adder and the output register. The output register was added for the synthesizer's sake, to constrain the path, and does not contribute to the delay in any of the timing reports. The path is illustrated in figure \ref{fig:critical} along with the delay information.

\begin{figure}[h]
	\centering
		\includegraphics[width=6in]{./images/criticalpath.PNG}
	\caption{Critical path of the design}	\label{fig:critical}
\end{figure}

\FloatBarrier

\section{Results}
\label{sec:results}

The original image that is convoluted in the simulation is shown in figure \ref{fig:lena}. With a clock period of 2 ns, the convolution of this image takes a total of 3252224 ns, which breaks down as follows:

\begin{enumerate}
\item Memory Initialization: 131072 ns ($256 \times 256 \times 2$ ns).
\item Horizontal Movement: 1560576 ns ($256 \times 254 \times 12 \times 2$ ns), where 12 is the number of cycles taken for one horizontal movement of the filter.
\item Vertical Movement: 1560576 ns ($256 \times 254 \times 12 \times 2$ ns), where 12 is the number of cycles taken for one vertical movement of the filter.
\end{enumerate}
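
Written out, the three contributions add up as follows. The symbols $T_{\mathrm{init}}$, $T_{h}$, $T_{v}$ and $T_{\mathrm{total}}$ are shorthand introduced here for the items in the list above:
\begin{align*}
T_{\mathrm{init}} &= 256 \times 256 \times 2\ \mathrm{ns} = 131072\ \mathrm{ns}, \\
T_{h} = T_{v} &= 256 \times 254 \times 12 \times 2\ \mathrm{ns} = 1560576\ \mathrm{ns}, \\
T_{\mathrm{total}} &= T_{\mathrm{init}} + T_{h} + T_{v} = 131072 + 2 \times 1560576 = 3252224\ \mathrm{ns} \approx 3.25\ \mathrm{ms}.
\end{align*}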

The left image in figure \ref{fig:lenanew} is computed using the filter mask `000010000', and the right image using the filter mask `010101010'.

\begin{figure}[h]
	\centering
		\includegraphics[width=2in]{./images/lena.PNG}
	\caption{Original Image}	
	\label{fig:lena}
\end{figure}

\FloatBarrier

\begin{figure}[htp]
  \begin{center}
    \subfigure{\label{fig:lena1}\includegraphics[height=2in]{./images/lena1.PNG}}
    \hspace{1in}
    \subfigure{\label{fig:lena2}\includegraphics[height=2in]{./images/lena2.PNG}}
  \end{center}
  \caption{Convolution Results}
  \label{fig:lenanew}
\end{figure}

\FloatBarrier

Table \ref{tab:conclusion} summarizes the results obtained from the synthesis of the design.

\begin{table}[h]
	\caption{Summary}
	\begin{center}
		\begin{tabular}{|p{1cm}|p{2cm}|p{2cm}|l|l|} \hline
			\textbf{T}$_C$ [ns] & \textbf{Critical Path} [ns] & \textbf{N. cycles} (256 x 256) & \textbf{E}$_{pc}$ [mW/MHz] & \textbf{AREA} [$\mu\mathrm{m}^2$] \\ \hline
			7 & 4.7 & 3121152 & 0.01351 & 53079 \\ \hline
			5 & 4.7 & 3121152 & 0.01020 & 53079 \\ \hline
			3 & 2.9 & 3121152 & 0.00684 & 58611 \\ \hline
			2 & 1.9 & 3121152 & 0.00556 & 67700 \\ \hline
		
		\end{tabular}
	\end{center}
	\label{tab:conclusion}
\end{table}
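
The energy-per-cycle column can be recomputed from the total power in table \ref{tab:synth}, assuming that $E_{pc}$ is the total power divided by the clock frequency $f = 1/T_C$. The report does not spell out this definition, but it reproduces the tabulated values. For $T_C = 7$ ns:
\begin{align*}
E_{pc} = \frac{P_{\mathrm{tot}}}{f} = P_{\mathrm{tot}} \cdot T_C = 1.93\ \mathrm{mW} \times 7\ \mathrm{ns} = 13.51\ \mathrm{pJ} = 0.01351\ \mathrm{mW/MHz},
\end{align*}
using $1\ \mathrm{mW/MHz} = 1\ \mathrm{nJ}$. The same calculation for $T_C = 2$ ns gives $2.78\ \mathrm{mW} \times 2\ \mathrm{ns} = 5.56\ \mathrm{pJ} = 0.00556\ \mathrm{mW/MHz}$.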

\FloatBarrier

\section{Limitations and Extensions}
\label{sec:limitations}

We have designed and implemented the architecture of an image filtering processor that uses a $3\times3$ filter to perform a convolution on an image of $256\times256$ pixels. Though the results of the convoluted image look promising, as shown in section \ref{sec:results}, we are aware of some limitations of the work. Given more time, we would have liked to add the following missing aspects to the project.

\begin{enumerate}
\item We have only been able to implement the $3\times3$ filter for the convolution. Although it was proposed that we would also implement higher-dimension filters, including $5\times5$, $7\times7$ and $9\times9$, we were not able to do so because of the initial problems we faced in implementing the $3\times3$ filter itself. We believe the convoluted images shown in section \ref{sec:results} could be better if the dimension of the filter were higher, since the blur effect on the image would then be more clearly evident than with the $3\times3$ filter. We briefly describe how the design of the processor would be modified to convolute the image with higher-dimension filters. If the dimension of the filter is $n \times n$, then we have the following:
	\begin{itemize}
		\item Number of Adders = $n^2$
		\item Number of Multipliers = $n$
		\item Size of the Cache Shift Register = $n^2$
		\item Multiplexer would have $n$ inputs and $1$ output.
		\item Select signal from the Output Controller would be 3 bits for 5x5 and 7x7, and 4 bits for 9x9 filter.
	\end{itemize}	
	
	In addition, the synchronization of the Input and the Output Controllers would change, because of the number of clock cycles required to get $n$ new pixels from the memory and compute the output for $n$ pixels at a time. The cases described in section \ref{sec:memreadwrite} would become the following for the $n \times n$ filter:
		\begin{itemize}
		\item Number of cycles taken for the Input Controller to read pixels due to the filter movement (horizontal or vertical): $n$.
		\item Number of cycles taken for the Output Controller to read and write $n$ new pixels to the output memory: $3n$, since there are three states for every pixel, namely, read, idle and write.
		\item Number of cycles for which the Output Controller waits in the case when the horizontal movement shifts to the next row or the vertical movement shifts to the next column: $\underbrace{(n + 3n) + \dots + (n + 3n)}_{(n-1)\ \text{times}} + n = 4n(n-1) + n = 4n^2 - 3n$.
	\end{itemize}	

	
	
\item It is assumed that the filter is symmetric about both the x and y axes. We need this because the indices of the filter coefficients that are multiplied with the image pixels change between the horizontal and vertical movements. For simplicity, we have therefore made this assumption.
The solution to this problem is quite simple, though. We just need separate caches in the processor that hold the filter values in different orders: one cache would be used for the horizontal movement and the other for the vertical movement.

\item The mechanism we have designed for accessing the memory is of course not the best one. Since we began the implementation with the sequence explained in section \ref{sec:memoryaccess}, we did not change it later. We did realize, however, that it is not efficient, since it takes a large number of clock cycles to run through the entire image of $256\times256$ pixels.
\end{enumerate}	
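
Evaluating the cycle-count expressions from the first item for the filter sizes mentioned there gives a feel for how the timing scales. Here $C_{\mathrm{step}} = n + 3n = 4n$ (one read phase plus one read-and-write phase) and $C_{\mathrm{line}} = 4n^2 - 3n$ (the wait at a row or column change) are shorthand introduced for this illustration only:

\begin{center}
	\begin{tabular}{|l|l|l|} \hline
		$n$ & $C_{\mathrm{step}} = 4n$ & $C_{\mathrm{line}} = 4n^2 - 3n$ \\ \hline
		3 & 12 & 27 \\ \hline
		5 & 20 & 85 \\ \hline
		7 & 28 & 175 \\ \hline
		9 & 36 & 297 \\ \hline
	\end{tabular}
\end{center}

The $n = 3$ row reproduces the 12 and 27 cycles reported for the implemented design; the other rows are projections from the formulas, not measured results.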

\newpage
\section{VHDL Implementation Files}
\label{section:impl}
The following VHDL files are the core of the project. In addition, we have created a number of test benches to test the individual components. These extra files (along with the core files) are provided in the ZIP archive. 

\lstinputlisting[frame=trbl,caption={Memory.vhd}]{../code/Memory.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={FSM\_in.vhd}]{../code/FSM_in.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={FSM\_out.vhd}]{../code/FSM_out.vhd}

\lstinputlisting[frame=trbl,caption={REG.vhd}]{../code/REG.vhd}
\newpage
\lstinputlisting[frame=trbl,caption={SHIFTREG.vhd}]{../code/SHIFTREG.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={CRA\_15.vhd}]{../code/CRA_15.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={CSA\_15bit.vhd}]{../code/CSA_15bit.vhd}
\newpage
\lstinputlisting[frame=trbl,caption={parcial.vhd}]{../code/parcial.vhd}
\newpage
\lstinputlisting[frame=trbl,caption={Multiplier.vhd}]{../code/Multiplier.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={Adder\_2.vhd}]{../code/Adder_2.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={Adder\_3.vhd}]{../code/Adder_3.vhd}
\newpage
\lstinputlisting[frame=trbl, caption={MUX\_4.vhd}]{../code/MUX_4.vhd}
\newpage
\lstinputlisting[frame=trbl,caption={Processor\_3.vhd}]{../code/Processor_3.vhd}
\newpage
\lstinputlisting[frame=trbl,caption={TB\_filter.vhd}]{../code/TB_filter.vhd}

\newpage
Following are the Reports from Synopsys.
\lstinputlisting[frame=trbl, caption={Area 2 ns}]{../Reports/area2ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Area 3 ns}]{../Reports/area3ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Area 5 ns}]{../Reports/area5ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Area 7 ns}]{../Reports/area7ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Total Power 2 ns}]{../Reports/power2ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Total Power 3 ns}]{../Reports/power3ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Total Power 5 ns}]{../Reports/power5ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Total Power 7 ns}]{../Reports/power7ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Power Breakdown 2 ns}]{../Reports/power_breakdown2ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Power Breakdown 3 ns}]{../Reports/power_breakdown3ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Power Breakdown 5 ns}]{../Reports/power_breakdown5ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Power Breakdown 7 ns}]{../Reports/power_breakdown7ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Timing 2 ns}]{../Reports/timing2ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Timing 3 ns}]{../Reports/timing3ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Timing 5 ns}]{../Reports/timing5ns.txt}
\newpage
\lstinputlisting[frame=trbl, caption={Timing 7 ns}]{../Reports/timing7ns.txt}

\end{document}